This module covers the diversity of data types, sources, and methods of data collection, focusing on data wrangling processes that transform raw data into usable formats by eliminating redundancies through cleansing and normalization. The module explores big data, technologies, and data management techniques to address security risks and challenges in handling diverse datasets. It also delves into data storage formats, collection methods, and data analysis in organizational contexts, with forward-looking discussions on machine learning applications in big data.
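To make the cleansing and normalization steps described above concrete, here is a minimal sketch in Python. The trip records, the field names, and the `cleanse` and `normalize` helpers are all hypothetical and are not taken from the module materials; they only illustrate the idea of removing redundant rows and then splitting repeated values into a separate lookup table.

```python
def cleanse(records):
    """Trim whitespace, lowercase city names, and drop rows that become exact duplicates."""
    seen = set()
    cleaned = []
    for rec in records:
        row = {
            "trip_id": rec["trip_id"],
            "rider": rec["rider"].strip(),
            "city": rec["city"].strip().lower(),
        }
        key = tuple(row.items())
        if key not in seen:  # keep only the first copy of each cleaned row
            seen.add(key)
            cleaned.append(row)
    return cleaned


def normalize(rows):
    """Split flat rows into a rider lookup table and a trips table that references it."""
    rider_ids = {}
    trips = []
    for row in rows:
        # Assign each distinct rider name a small integer id on first sight.
        rider_id = rider_ids.setdefault(row["rider"], len(rider_ids) + 1)
        trips.append({"trip_id": row["trip_id"], "rider_id": rider_id, "city": row["city"]})
    riders = [{"rider_id": rid, "name": name} for name, rid in rider_ids.items()]
    return riders, trips


# Hypothetical raw data: the first two rows are the same trip with messy formatting.
raw = [
    {"trip_id": 1, "rider": " Ana ", "city": "London"},
    {"trip_id": 1, "rider": "Ana", "city": "london "},
    {"trip_id": 2, "rider": "Ana", "city": "Leeds"},
    {"trip_id": 3, "rider": "Ben", "city": "Leeds"},
]
riders, trips = normalize(cleanse(raw))
```

In this sketch the rider name is stored once in `riders` and referenced by id from `trips`, which is the same redundancy-removal idea that database normalization applies at a larger scale.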

 

Learning Outcomes

I will be able to:

  • Identify and manage challenges, security issues, risks, limitations, and opportunities in data wrangling.
  • Critically analyze data wrangling problems and apply appropriate tools and methodologies for preparing, cleaning, exploring, and evaluating big data.
  • Design, develop, and evaluate solutions for processing datasets and solving complex problems using relevant programming paradigms.
  • Develop and apply professional skills to be an effective team member in a virtual, real-world environment.

Artefacts and Feedback

Click the bold text to open each project.

  • Collaborative Discussion 1 Summary: In this project, I critically evaluated the rationale behind the Internet of Things (IoT) in the context of the article by Huxley et al. (2020), highlighting the opportunities, risks, and challenges associated with large-scale data collection. I also explored the importance of data wrangling in managing IoT-generated big data.

    My initial post was praised for highlighting how IoT contributes to predictive analytics in fleet management, particularly in sectors like transportation. Peer feedback focused on the challenges of managing IoT data, emphasizing the need for advanced architectures like Lambda and Kappa. This feedback helped me refine my approach to discussing security and data validation concerns in IoT integration.

     
  • Mathematics Test Summary: A test on Python concepts, libraries, and practices. Although I scored 94%, I realized that some of the terminology was unfamiliar to me, signaling the importance of continuous learning beyond my current work experience.

    The high score reaffirmed my technical abilities, but my reflection emphasized the need to stay updated on modern terminologies. I now make an effort to keep learning beyond my day-to-day role by engaging with industry trends and participating in professional development courses.

     
  • Team Project Summary: This group project involved designing a logical database for a ride-hailing company. I contributed to the database build proposal, focusing on key aspects such as scalability, data cleaning, and security. I used PostgreSQL as the DBMS and designed a data management pipeline to process large volumes of trip data.

    The feedback indicated that while the report demonstrated good knowledge of database design, it lacked critical evaluation, particularly in comparing alternative systems. We were encouraged to enhance our referencing and ensure that every recommendation was backed by a well-reasoned analysis. The team was disappointed by the grade of 62%, but we learned the importance of critical evaluation.

     
  • Individual Executive Summary: This project involved summarizing the work from Unit 6. I incorporated feedback by improving the critical analysis of the pros and cons of SQL vs NoSQL databases, focusing on MongoDB for its flexibility in handling unstructured data. I also integrated GDPR compliance measures into the database design.

    Though I haven't received formal feedback yet, I believe the improvements I made to the integrity of the project, especially the stronger critical evaluation and additional referencing, have significantly enhanced the final report.

       

Reflections and Meeting Notes

Click the bold text to open each project.

  • Collaborative Discussion 1:

    This project focused on analyzing the role of IoT in big data systems. I explored the opportunities for real-time decision-making in transportation, while also addressing the risks of security breaches and data complexity.

    The project taught me to balance technical expertise with an understanding of broader risks and opportunities. The feedback highlighted gaps in my discussion around security, which I now recognize as a critical aspect in the IoT context. My peers' input on data validation and processing architectures broadened my perspective, helping me improve my analysis.

    Moving forward, I will integrate stronger security considerations into my work, particularly as we deal with IoT data for fleet management. I have started researching advanced data processing architectures like Lambda to handle the high volumes of IoT data more effectively.

     
  • Mathematics Test:

    This test assessed my ability to apply Python methodologies for data wrangling. While I achieved a high score, I discovered that I lacked familiarity with certain modern terminologies.

    This experience underscored the importance of staying up-to-date with industry terminology and techniques. Although I have significant professional experience, I learned that continuous education is essential for maintaining a competitive edge in data science.

    I have since committed to ongoing learning, through my Master's and podcasts like Super Data Science, to ensure I remain current with emerging trends and terminology.

     
  • Team Project:

    I contributed to the design of a scalable database for a ride-hailing company, focusing on the logical structure and data pipeline. However, the project lacked depth in critical evaluation, particularly in comparing SQL and NoSQL systems.

    I realized the importance of providing a thorough critical analysis when recommending a system. The feedback emphasized that without comparing the pros and cons of each system, we missed an opportunity to present a balanced view to the client. The need for improved referencing also became clear.

    I applied this lesson in Unit 11, ensuring that my executive summary included a more detailed comparison of SQL and NoSQL systems, supported by academic references. This approach helped me strengthen the overall quality of the report, and I am confident that the improved critical analysis will lead to a better result.

     
  • Individual Executive Summary:

    This project allowed me to apply the feedback from Unit 6. I focused on improving the critical evaluation of the database management system and addressed GDPR compliance more thoroughly.

    This project was a turning point in applying critical thinking to my work. I gained a deeper understanding of how to balance flexibility (NoSQL) with reliability (SQL), and the importance of compliance in database design.

    I feel more confident in critically evaluating technologies and plan to apply these skills in my professional roles. When developing new projects and getting coworkers and stakeholders involved, I will focus more on the integrity of what I present. I can do this by using strong references, understanding what I am saying and, most importantly, consulting more knowledgeable peers, as I did in the Unit 6 project that fed into this one.

     

    Throughout this module, I have developed a robust understanding of key data science concepts, particularly in data wrangling, database design, and critical evaluation techniques. Each project allowed me to transition from theory to practical application, which has deepened my technical expertise. I learned to critically assess the strengths and limitations of different tools and methodologies, particularly in handling large datasets and designing scalable systems.

    One of the most significant aspects of my growth during this module was recognizing the importance of continuous learning. My performance in the Mathematics Test, despite being strong, revealed gaps in my familiarity with newer terminologies. This was a humbling reminder that the field of data science evolves rapidly, and it is essential to stay current with the latest trends and practices. As Morris (2021) notes, reflection is a key process in helping individuals critically assess their understanding, allowing them to adapt to the changing knowledge landscape. I have since committed to ongoing learning by listening to podcasts like "Super Data Science" and engaging with professional development courses. This has not only kept me updated but has also allowed me to approach projects with a more holistic and informed perspective.

    Another key takeaway from this module was the importance of critical evaluation and the need for a balanced approach when recommending systems or solutions. In the Unit 6 project, my team’s lack of in-depth critical evaluation led to a lower grade than expected, and I realized how essential it is to compare alternative systems and provide a thorough analysis backed by strong references. This was a turning point for me. Moving forward, I implemented these learnings in Unit 11, ensuring that my executive summary included a more detailed comparison of SQL and NoSQL systems, supported by academic references. This improved my confidence in presenting my findings and reinforced the need for integrity in my work.

    The collaborative aspect of the module also enhanced my professional skills, particularly in virtual teamwork. Working with peers in a remote environment taught me to communicate effectively, delegate tasks, and integrate feedback. The University of Essex Online emphasizes that reflective practice in teamwork is crucial for understanding how to improve cooperation and project outcomes. These are skills I will carry forward into my professional roles, where cross-functional collaboration is key to successful project delivery.

    Furthermore, I have gained a deeper appreciation for the importance of regulatory compliance in database design, especially in relation to GDPR. Through the Unit 11 project, I developed a clearer understanding of how data security and privacy considerations must be embedded into every stage of the design process. Reflective practice helps ensure that systems align with regulatory standards and fosters a proactive approach to compliance (University of Essex Online). I now see the importance of constantly re-evaluating our approaches to ensure we meet these requirements.

    In summary, this module has not only strengthened my technical capabilities but has also shifted my mindset towards continuous improvement. Reflection enables individuals to critically analyze their actions and make conscious efforts to improve (Morris, 2021). I now recognize the value of critical evaluation, the importance of referencing, and the need for ongoing learning. These insights will inform my future work, particularly in my role as a Data Scientist, where I aim to apply these skills to solve complex data challenges and drive innovation. Ultimately, this module has prepared me to approach future projects with greater rigor, ensuring that the solutions I propose are both technically sound and aligned with best practices in the field.

    Morris, C. 2021. Working with critical reflective pedagogies at a moment of post-truth populist authoritarianism. Available from: https://www.tandfonline.com/doi/full/10.1080/13562517.2021.1965568. [Accessed 12 October 2024]

    University of Essex Online. A short guide to Reflective Writing. Available from: https://www.my-course.co.uk/pluginfile.php/516814/mod_resource/content/2/A%20short%20guide%20to%20reflective%20writing.pdf. [Accessed 12 October 2024]

Professional Skills Matrix and Action Plan

 

Skills Gained or Enhanced:

  • Data wrangling and pipeline development using Python and SQL.
  • Database design, including normalization and the application of both SQL and NoSQL DBMS.
  • Critical evaluation of database systems, with an emphasis on security, scalability, and compliance.
  • Collaboration and teamwork in virtual environments.
  • Referencing and integrity in the work I present.

Action Plan:

  • Short-Term: Continue refining my knowledge of NoSQL databases, particularly MongoDB, and explore more advanced topics in data pipeline optimization. If I understand how the data flows, I can be more effective in my role as a Data Scientist.
  • Medium-Term: Focus on applying machine learning techniques to big data environments, particularly in relation to my work. This will allow me to gain more experience with big data and work with it daily.
  • Long-Term: Integrate the skills learned in this module into future projects, with an emphasis on balancing technical expertise with critical evaluation and regulatory compliance. In the long term, this will help my work maintain the utmost integrity.